A New Measure of the Cluster Hypothesis

Authors

  • Mark D. Smucker
  • James Allan
Abstract

We have found that the nearest neighbor (NN) test is an insufficient measure of the cluster hypothesis. The NN test is a local measure of the cluster hypothesis. Designers of new document-to-document similarity measures may incorrectly report effective clustering of relevant documents if they use the NN test alone. Utilizing a measure from network analysis, we present a new, global measure of the cluster hypothesis: normalized mean reciprocal distance. When used together with a local measure, such as the NN test, this new global measure allows researchers to better measure the cluster hypothesis.
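To make the local/global distinction concrete, here is a minimal sketch of both kinds of measure. It is not the authors' implementation, and the abstract does not give the formulas, so everything below is an assumption for illustration: the function names (`rank_matrix`, `nn_test`, `normalized_mrd`), the dense similarity matrix `sim`, the choice of rank as edge weight between relevant documents, and the harmonic-sum normalizer. The global score is a mean reciprocal shortest-path distance in the spirit of global efficiency from network analysis.

```python
def rank_matrix(sim):
    """rank[i][j] = 1-based rank of document j when all other documents are
    sorted by decreasing similarity to document i (ties broken by index)."""
    n = len(sim)
    rank = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-sim[i][j], j))
        for r, j in enumerate(others, start=1):
            rank[i][j] = r
    return rank


def nn_test(sim, relevant, k=5):
    """Local measure (assumed form): mean fraction of relevant documents
    among the k nearest neighbours of each relevant document."""
    n = len(sim)
    if not relevant or n < 2:
        return 0.0
    kk = min(k, n - 1)  # cannot have more neighbours than other documents
    rank = rank_matrix(sim)
    rel = set(relevant)
    total = 0.0
    for i in relevant:
        hits = sum(1 for j in range(n)
                   if j != i and rank[i][j] <= kk and j in rel)
        total += hits / kk
    return total / len(relevant)


def normalized_mrd(sim, relevant):
    """Global measure (assumed form): mean reciprocal shortest-path distance
    between relevant documents, divided by an assumed ideal value."""
    rel = list(relevant)
    m = len(rel)
    if m < 2:
        return 0.0
    rank = rank_matrix(sim)
    # Directed graph over relevant documents only; the weight of edge a -> b
    # is the rank of rel[b] in the ranking produced by rel[a] (assumption).
    dist = [[0 if a == b else rank[rel[a]][rel[b]] for b in range(m)]
            for a in range(m)]
    # Floyd-Warshall all-pairs shortest paths.
    for via in range(m):
        for a in range(m):
            for b in range(m):
                if dist[a][via] + dist[via][b] < dist[a][b]:
                    dist[a][b] = dist[a][via] + dist[via][b]
    mrd = sum(1.0 / dist[a][b]
              for a in range(m) for b in range(m) if a != b) / (m * (m - 1))
    # Assumed normalizer: each relevant document ranks the other m - 1
    # relevant documents at ranks 1..m-1. Because shortest paths can be
    # shorter than direct ranks, this ratio is not guaranteed to stay <= 1.
    ideal = sum(1.0 / r for r in range(1, m)) / (m - 1)
    return mrd / ideal


# Toy collection: documents 0 and 1 are close to each other, as are 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(nn_test(sim, [0, 1], k=1), normalized_mrd(sim, [0, 1]))
print(nn_test(sim, [0, 2], k=1), normalized_mrd(sim, [0, 2]))
```

In this sketch the local score only looks at each relevant document's immediate neighbourhood, while the global score depends on how reachable every relevant document is from every other one, which is why the two can disagree on larger collections.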

Similar resources

An Investigation of Convergence Hypothesis of Price Index in Asian Stock Markets

The capital market in each country is considered as the most important part of the economy and its fluctuations may reflect the economic situation of the country. In this paper, the hypothesis of convergence of stock market price indices in Asian countries during the period from January 2007 to February 2017 is investigated using cluster analysis method. The results show that there is no eviden...

Comparison of Portfolios Formed by Use of Grid Strategy Model Based on New and Traditional Variables Performance With Sharpe and Treynor Measures (Evidence of IRAN Exchange)

In this research, performance of portfolios formed by use of grid strategy based on new variables (aggressive, indifference and defensive stocks) presented by Rahnamaye Roodposhti (1388), and traditional ones (growth, growth-value and value stocks), calculated with Sharpe and Treynor performance measures and tested by an Active portfolio management approach to identify the portfolios by perform...

The Cancer Stem Cell Hypothesis in Oral Squamous Cell Carcinoma: A New Target for the Treatment

Within a single tumor clone, cells have significantly different abilities to proliferate and form new tumors. This has led to the hypothesis that most cells in a cancer have a limited ability to divide and only a small subset of distinct cells, the cancer stem cells (CSCs), has the capacity to self-renew and form new tumors . It has been proposed that the development of tumors is based exclusiv...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Publication date: 2009